Hysteresis effects of changing parameters of noncooperative games 
We adapt the method used by Jaynes to derive the equilibria of statistical physics to instead derive equilibria
of bounded rational game theory. We analyze the dependence of these equilibria on the parameters of the 
underlying game, focusing on hysteresis effects. In particular, we show that by gradually imposing individual- 
specific tax rates on the players of the game, and then gradually removing those taxes, the players move from a 
poor equilibrium to one that is better for all of them. 
INTRODUCTION

The Maximum Entropy (Maxent) principle is an 
information-theoretic formalization of Occam's razor. It 
says that if we are given the expectation values of some 
functions of a system's state, then we should predict that the 
associated distribution is the one with minimal information 
(i.e., maximal entropy) consistent with those expectations
[1, 2]. Maxent provides a succinct way to derive much
of statistical physics [2-4], e.g., the canonical ensemble.

Noncooperative game theory [5-8] is the foundation of conventional
economics. It uses provided utility functions of a set
of human "players" to predict how the players will model one 
another. It then uses this to predict the players' joint behavior. 

Many recent applications of statistical physics to economics 
analyze it at a coarse-grained level, bypassing its game-theoretic
foundation. Here we build on [9] and apply Maxent
to game theory, thereby introducing statistical physics tech- 
niques into the foundation of economics. 

In this application of Maxent, there is a separate expecta- 
tion value for each player. In contrast, when applying Maxent 
to derive the canonical ensemble, there is a single expecta- 
tion value (of the system's energy). Accordingly, rather than 
the canonical ensemble's single Boltzmann distribution, in- 
volving a single Hamiltonian and a single "temperature", we 
derive a separate Boltzmann distribution for each player, in- 
volving only that player's utility function, and a "temperature" 
unique to that player. The players' Boltzmann distributions 
are coupled, and the joint solution provides a bounded ratio- 
nality version of the Nash equilibrium (NE) of game theory, 
where each player's inverse temperature quantifies their ratio- 
nality. 

We analyze the dependence of this modified NE on the 
parameters of the underlying game, focusing on bifurcation 
behavior and hysteresis effects. In particular, we show how 
by gradually imposing taxes on the players, and then grad- 
ually removing them, the joint behavior of the players can 
be moved from a poor equilibrium to a Pareto-superior one. 
(This can even be done if we require that the players agree to 
each infinitesimal change in tax rates, since each such change 
increases every player's expected utility.) This is particularly 
interesting given estimates that non-OECD countries could increase
their wealth by one third by moving from their current
equilibrium to a different one. Next we introduce three toy 
models of how a society can modify tax rates: via "social- 
ism", a "market", or "anarchy". We then compare these three 
models in terms of the associated discounted sum of total util- 
ities along the path of tax rates. 



BACKGROUND 

Many different axiomatic arguments establish that the 
amount of syntactic information in a distribution P(y) increases
as the Shannon entropy of that distribution,
$S(P) = -\sum_y P(y) \ln[P(y)]$, decreases. This provides a way to
formalize "Occam's razor": given limited prior data concern- 
ing P(y), predict P(y) is the distribution with minimal infor- 
mation (maximum entropy) consistent with that data. This 
formalization of Occam's razor is called the maximum en- 
tropy principle (Maxent). When the data concerning P(y) is 
expectation values of functions under P, Maxent has proven 
extremely accurate in domains ranging from signal processing 
to supervised learning [2]. Jaynes used it to derive statistical
physics [2], e.g., having the data be the expected energy of a
system or its expected number of particles of various types. 
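As a minimal sketch of the Maxent recipe described above, the following computes the maximum-entropy distribution over a finite set of states given the expectation value of a single function. The function names (`maxent_distribution`, `entropy`), the bisection bounds, and the three-state example are illustrative assumptions, not part of the original analysis.

```python
import math

def maxent_distribution(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maxent distribution over finite states, given the mean of `values`.

    The solution has the exponential form p(y) proportional to
    exp(lam * f(y)). The Lagrange parameter lam is found by bisection,
    using the fact that the mean is monotonically increasing in lam.
    The bounds lo and hi are assumed to bracket the solution.
    """
    def dist(lam):
        shift = max(lam * v for v in values)  # subtract max for stability
        weights = [math.exp(lam * v - shift) for v in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist(0.5 * (lo + hi))

def entropy(p):
    """Shannon entropy S(P) = -sum_y P(y) ln P(y)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)
```

For example, `maxent_distribution([0, 1, 2], 1.5)` has a higher entropy than the hand-picked distribution `[0.0, 0.5, 0.5]`, which has the same mean.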

A finite, strategic form noncooperative game consists of a
set of N players, where each player i has her own set of allowed
pure strategies $X_i$ of finite size $|X_i| < \infty$. A mixed strategy
is a distribution $q_i(x_i)$ over $X_i$. The joint distribution over
$X = \times_i X_i$ is $q(x) = \prod_i q_i(x_i)$, and is called a strategy profile.

Each player i has a utility function $u_i : X \to \mathbb{R}$. So
given strategy profile q, the expected utility of player i is
$E_q(u_i) = \sum_x \prod_j q_j(x_j)\, u_i(x)$, where
$q_{-i}(x_{-i}) = \prod_{j \neq i} q_j(x_j)$. A
Nash equilibrium (NE) is a strategy profile in which
every player i sets $q_i$ to maximize i's expected utility, i.e.,

$$\forall i, \quad q_i = \operatorname{argmax}_{q'_i} \sum_x q'_i(x_i)\, q_{-i}(x_{-i})\, u_i(x).$$

In general, this set of coupled equations has multiple solutions.
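The definitions above can be sketched for a two-player game as follows. This is an illustrative sketch: the function names and the 2x2 coordination-game payoffs `U_ROW` and `U_COL` are assumptions chosen for the example. The check relies on the fact that expected utility is linear in a player's own mixed strategy, so it suffices to test deviations to pure strategies.

```python
def expected_utility(u, q_row, q_col):
    """E_q(u) = sum_x q_row(x_row) * q_col(x_col) * u(x) for two players."""
    return sum(q_row[a] * q_col[b] * u[a][b]
               for a in range(len(q_row)) for b in range(len(q_col)))

def is_nash(u_row, u_col, q_row, q_col, tol=1e-9):
    """True if neither player can gain by deviating to any pure strategy."""
    n_row, n_col = len(q_row), len(q_col)
    base_row = expected_utility(u_row, q_row, q_col)
    base_col = expected_utility(u_col, q_row, q_col)
    for a in range(n_row):
        pure = [1.0 if k == a else 0.0 for k in range(n_row)]
        if expected_utility(u_row, pure, q_col) > base_row + tol:
            return False
    for b in range(n_col):
        pure = [1.0 if k == b else 0.0 for k in range(n_col)]
        if expected_utility(u_col, q_row, pure) > base_col + tol:
            return False
    return True

# Illustrative payoffs: u[a][b] is the utility when Row plays a, Column plays b.
U_ROW = [[2, 0], [0, 1]]
U_COL = [[1, 0], [0, 2]]
```

With these payoffs the two coordinated pure profiles and one mixed profile all pass the check, which illustrates how the coupled equations can have multiple solutions.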

A well-recognized problem of using the NE to predict real- 
world behavior is its assumption that every player chooses 
their optimal mixed strategy, which is called full rationality. 
This assumption is violated (often badly) in many experimental
settings [10, 11]. Our modified NE derived using Maxent
accommodates such bounded rationality. 
MAXENT AND QUANTAL RESPONSE EQUILIBRIA

To predict what q the players in a given N-player game
$\Gamma$ will adopt, first pick one of the players, i. Consider a
counterfactual situation, where i has the same move space
and utility function as in $\Gamma$, but rather than have a set of
N - 1 other humans set the distribution over $X_{-i}$, an inanimate
stochastic system sets that distribution, to some $q_{-i}$.
In general, due to her limited knowledge, limited computational
power, etc., i will choose a suboptimal $q_i$, i.e.,
$q_i \notin \operatorname{argmax}_{p_i} [E_{p_i, q_{-i}}(u_i)]$. To quantify this bounded rationality,
in analogy to Jaynes' derivation of the canonical ensemble,
constrain $q_i$ so that $E_{q_i, q_{-i}}(u_i)$ has some (nonmaximal) value
for the given $q_{-i}$. Then Maxent says

$$q_i(x_i) \propto \exp[\beta_i E_{q_{-i}}(u_i \mid x_i)], \qquad (1)$$

where $\beta_i$ is the Lagrange parameter enforcing the constraint.
Note that as $\beta_i \to \infty$, i becomes increasingly rational, whereas
as $\beta_i \to 0$, she becomes increasingly irrational.
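The two limits of Eq. (1) can be checked with a short sketch. The function name `logit_response` and the example conditional utilities are illustrative assumptions. The input list holds the conditional expected utilities $E_{q_{-i}}(u_i \mid x_i)$, one per pure strategy.

```python
import math

def logit_response(cond_utils, beta):
    """Eq. (1): q_i(x_i) proportional to exp(beta * E(u_i | x_i))."""
    m = max(beta * u for u in cond_utils)  # shift for numerical stability
    weights = [math.exp(beta * u - m) for u in cond_utils]
    z = sum(weights)
    return [w / z for w in weights]

uniform = logit_response([2.0, 1.0], 0.0)    # beta = 0: fully irrational
sharp = logit_response([2.0, 1.0], 50.0)     # large beta: nearly optimal
```

At `beta = 0` the player ignores utilities and mixes uniformly. At large `beta` almost all probability sits on the strategy with the highest conditional expected utility.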

Next, recall that by the axioms of utility theory [12], all
that player i is concerned with in choosing her mixed strategy
is the resultant expected utility. Accordingly, we presume that
if the best i can do is choose a particular $q_i$ when $q_{-i}$ is set
by an inanimate system, she would also choose $q_i$ if she faces
that same distribution $q_{-i}$ when it is set by other humans.

Generalizing, Maxent says that Eq. (1) should hold simultaneously
for all N players i, with player-specific Lagrange parameters.
This gives a set of N coupled nonlinear equations
for q. Brouwer's fixed point theorem [13] guarantees that this set
always has a solution, and in general it has more than one.¹
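One way to find a solution of the coupled equations numerically, for a two-player game, is damped fixed-point iteration on Eq. (1). This is a sketch under stated assumptions: the names `logit` and `solve_qre`, the damping factor, and the iteration count are illustrative choices. The iteration returns at most one solution (which one depends on the starting point) and is not guaranteed to converge for every game or every $\beta$.

```python
import math

def logit(cond_utils, beta):
    """Eq. (1) for one player, given conditional expected utilities."""
    m = max(beta * u for u in cond_utils)
    weights = [math.exp(beta * u - m) for u in cond_utils]
    z = sum(weights)
    return [w / z for w in weights]

def solve_qre(u_row, u_col, beta_row, beta_col, q_row=None, q_col=None,
              damping=0.5, iters=5000):
    """Damped iteration of both players' logit responses (a heuristic)."""
    n_row, n_col = len(u_row), len(u_row[0])
    q_row = q_row or [1.0 / n_row] * n_row
    q_col = q_col or [1.0 / n_col] * n_col
    for _ in range(iters):
        # Conditional expected utility of each pure strategy.
        cond_row = [sum(q_col[b] * u_row[a][b] for b in range(n_col))
                    for a in range(n_row)]
        cond_col = [sum(q_row[a] * u_col[a][b] for a in range(n_row))
                    for b in range(n_col)]
        new_row = logit(cond_row, beta_row)
        new_col = logit(cond_col, beta_col)
        # Move part of the way toward the new responses.
        q_row = [(1 - damping) * o + damping * n
                 for o, n in zip(q_row, new_row)]
        q_col = [(1 - damping) * o + damping * n
                 for o, n in zip(q_col, new_col)]
    return q_row, q_col
```

Starting the iteration from different initial `q_row` and `q_col` is a simple way to look for additional solutions when more than one exists.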

This prediction for q is not based on a model of bounded 
rational human behavior derived from experimental data. It 
is based on desiderata concerning the prediction process, not 
on a model of the system being predicted. Nonetheless, it 
is intriguing to note that maximizing Shannon entropy has a 
natural interpretation in terms of a common model of human 
bounded rationality, involving the cost of computation. To see 
this, recall that $-S(q_i)$ measures the amount of information in
the distribution $q_i$. Say we equate the cost to i of computing $q_i$
with this amount of information. Then under Maxent, player
i minimizes the cost of computing her mixed strategy, subject
to a constraint on the value of her expected utility that acts
as an "aspiration level". Under this interpretation, $\beta_i$ quantifies
i's cost of computing $q_i$, in units of expected utility. Future
work involves incorporating experimental data concerning human
behavior as additional constraints in the Maxent procedure. (Other
models of the cost of computation can be found in [14-18].)

Solutions to our N coupled equations for q are typically
called "logit Quantal Response Equilibria" (QRE) in game
theory [19-22]. They have been independently suggested several
times as a way to model human players [14, 23-27]. In all
this earlier work the logit distribution is not derived from first
principles.² Nor is it related to information theory, or the cost
of computation. Rather, the logit QRE has typically been used
as an ad hoc, few-degree-of-freedom model of boundedly rational
play. As such it has been widely and successfully used to
fit experimental data concerning human behavior.³

¹ An alternative Maxent approach would use it to set the entire joint distribution
$q(x) = \prod_i q_i(x_i)$ at once, rather than use it to set each $q_i$ separately and
then impose self-consistency. However, there are difficulties in choosing
what constraints to use under this approach. See [9].

FIG. 1. $E(u_{col})$ vs. $\beta$ under the QRE of the game in Eq. (3). The
hysteresis path involving bonds discussed in the text is highlighted.

THE SHAPE OF THE QRE SURFACE 

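To analyze the QRE surface of Eq. (1) we express it as
a set of functional relationships, $q_i = f_i(q_{-i}, \beta_i)$,
$q_{-i} = f_{-i}(q_i, \beta_{-i})$. A bifurcation may occur if for some i

$$\frac{\partial f_i}{\partial q_{-i}} \frac{\partial f_{-i}}{\partial q_i} \frac{\partial q_i}{\partial \beta_i} + \frac{\partial f_i}{\partial \beta_i} = \frac{\partial q_i}{\partial \beta_i} \qquad (2)$$

cannot be solved for $\partial q_i / \partial \beta_i$, i.e., if
$\det\left(\frac{\partial f_i}{\partial q_{-i}} \frac{\partial f_{-i}}{\partial q_i} - \mathrm{Id}\right) = 0$. To illustrate
this and related phenomena, we consider games between
a Row and Column player, each with two pure strategies. The
first is the famous "battle of the sexes" coordination game [5],
where the utility functions are

        2|1    0|0
        0|0    1|2        (3)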

where the first (second) entry in each cell is the Row (Column)
player's utility for the associated pure strategy profile.
Each joint inverse temperature $\beta = (\beta_{row}, \beta_{col})$ fixes QRE
q's for this game, and therefore QRE expected utilities. Fig. 1
plots this surface, taking $\beta$ to $E_q(u_{col})$. At bifurcations the number
of QRE solutions changes between one and three, and infinitesimal
changes in $\beta$ may result in discontinuous changes
in expected utility. (E.g., this happens if the system starts at
$\beta = (5, 5)$ on the top surface, and then $\beta_{row}$ is reduced to 0.)

² The QRE literature justifies the logit distribution by appealing to choice
theory [28], where it arises if double-exponential noise is added to player
utility values. However, that double-exponential noise assumption is never
axiomatically justified in choice theory; it is assumed for the calculational
convenience that it results in the logit distribution.

³ The logit distribution in Eq. (1) also arises in Reinforcement Learning [29-32],
as a way to design artificial agents that learn from experience.

FIG. 2. The expected utility of the Row player along the path through
$\beta$ highlighted in Fig. 1, illustrated as a function of the Row player's
rationality, $\beta_{row}$. The path starts at the bottom right, then travels left,
before turning and finishing at the top right. Even though the Row
player ultimately benefits if society follows this path, at the beginning
they lose expected utility. They may demand to be compensated
for that initial drop, e.g., with proceeds of a bond that are paid off by
both players when the end of the path is reached.
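The change in the number of QRE solutions can be sketched numerically for the game of Eq. (3). Let p be the probability that Row plays her first pure strategy and r the probability that Column plays his first. Eq. (1) then reduces to $p = \sigma(\beta_{row}(3r - 1))$ and $r = \sigma(\beta_{col}(3p - 2))$, with $\sigma$ the logistic function. Substituting one into the other gives a single equation in p whose roots can be counted by sign changes on a grid. The function names and grid resolution are illustrative assumptions, and a root that falls exactly on a grid point, or two roots closer together than the grid spacing, would be miscounted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def qre_residual(p, beta_row, beta_col):
    """Zero exactly when p is Row's first-strategy probability at a QRE."""
    r = sigmoid(beta_col * (3.0 * p - 2.0))      # Column's logit response
    return sigmoid(beta_row * (3.0 * r - 1.0)) - p

def count_qre(beta_row, beta_col, grid=2000):
    """Count sign changes of the residual on a uniform grid over [0, 1]."""
    count = 0
    prev = qre_residual(0.0, beta_row, beta_col)
    for k in range(1, grid + 1):
        cur = qre_residual(k / grid, beta_row, beta_col)
        if prev * cur < 0:
            count += 1
        prev = cur
    return count
```

Under these assumptions, `count_qre(5.0, 5.0)` finds three solutions, while `count_qre(0.5, 0.5)` finds one.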

An interesting effect occurs if we multiply the utilities by
-1. Fig. 3 illustrates part of the surface after this switch.
Note that on the bottom fold, for fixed $\beta_{col}$, decreasing $\beta_{row}$
increases $E(u_{row})$. So Row benefits by being less rational, due
to how Column responds to Row's drop in rationality.

By Eq. (1), changing $\beta_i$ affects the QRE q the same way as
keeping $\beta_i$ fixed but multiplying $u_i$ by some $\alpha_i$. So Figs. 1 and 3
give QRE surfaces where $\beta$ is fixed, but each $u_i$ is multiplied
by $\alpha_i$. (Formally, reinterpret the x and y axes as $\alpha_{row}$ and
$\alpha_{col}$ rescaled, and reinterpret the z axis as $E_q(u_i)/\alpha_i$.) Note
we can interpret $1 - \alpha_i$ as a tax rate on player i. So if we
model rationalities as fixed, e.g., as behavioral attributes,
then on the bottom surface in Fig. 3 Row benefits if her tax
rate increases.
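The equivalence between rescaling utilities and rescaling rationality follows directly from the form of Eq. (1), and can be checked with a short sketch. The numerical values of `beta_i`, `alpha_i`, and the conditional utilities are illustrative assumptions.

```python
import math

def logit_response(cond_utils, beta):
    """Eq. (1): q_i(x_i) proportional to exp(beta * E(u_i | x_i))."""
    m = max(beta * u for u in cond_utils)
    weights = [math.exp(beta * u - m) for u in cond_utils]
    z = sum(weights)
    return [w / z for w in weights]

beta_i, alpha_i = 2.0, 0.7            # tax rate 1 - alpha_i = 0.3 (illustrative)
cond_utils = [2.0 * 0.4, 1.0 * 0.6]   # E(u_i | x_i) for some fixed q_{-i}

# Taxing utilities at fixed rationality ...
taxed = logit_response([alpha_i * u for u in cond_utils], beta_i)
# ... matches leaving utilities alone and lowering rationality.
rescaled = logit_response(cond_utils, alpha_i * beta_i)
```

Both calls return the same mixed strategy up to floating-point rounding, since only the product $\alpha_i \beta_i$ enters the exponent.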

The fact that Row may prefer a higher tax rate suggests 
that by varying tax rates "adiabatically" slowly, so that the 
joint behavior of the players is always on the QRE surface, 
we may be able to monotonically improve expected utilities for
both players. Indeed, by changing tax rates we can gradually 
move the equilibrium across the surface from one fold to the 
other, and then undo those changes, returning the rates to their 
original values, but leaving both players with higher expected 
utility. (See [33] for other work that exploits the shape of a 
QRE surface to optimize player joint behavior.) 

More precisely, there are paths of $\beta$'s (i.e., of $\alpha$'s) such that:

1. Neither player ever is more rational (taxed at a lower
rate) on the path than at the starting point.

2. At each step on the path, if after the next infinitesimal
change in $\beta$ there is a QRE q infinitesimally close to the
current one, it is adopted. (Adiabaticity.)

3. Each infinitesimal change in $\beta$ increases both $E_q(u_i)$'s.

4. At each infinitesimal step, if multiple changes in q meet
(1)-(3), but one is Pareto superior to the others (i.e., better
for both players), the players coordinate on that one.

Examples of such paths are illustrated in Fig. 3.

The existence of such paths raises the question of how a 
society should dynamically update its tax rates. We now com- 
pare three procedures for how this could be done by society as 
a whole. (For notational simplicity, and to emphasize the analogy
with annealing, we parameterize the procedures in terms
of their action on $\beta$ rather than on $1 - \alpha$.)

I. "Anarchy": Players independently decide how to modify
their $\beta$'s. To do this they follow gradient ascent
with a small step size $\Delta$, subject to the constraint that
no player i can go to a $\beta_i$ larger than the starting one.
Thus, each player i changes $\beta_i$ by $\delta\beta_i \in [-\Delta, \Delta]$, using
$\partial E(u_i)/\partial \beta_i$ to make their choice of what value $\delta\beta_i$ to
pick. (Since this is a linear procedure, the players will
always choose one of the three values $\{-\Delta, 0, \Delta\}$.)

II. "Socialism": An external regulator determines the path,
again using gradient ascent, this time over the sum of
the players' expected utilities. At each step of the path
$\beta$ is changed by the $(\delta\beta_{row}, \delta\beta_{col})$ vector that maximizes

$$\left[\delta\beta_{row} \frac{\partial E(u_{row})}{\partial \beta_{row}} + \delta\beta_{col} \frac{\partial E(u_{row})}{\partial \beta_{col}}\right] + \left[\delta\beta_{row} \frac{\partial E(u_{col})}{\partial \beta_{row}} + \delta\beta_{col} \frac{\partial E(u_{col})}{\partial \beta_{col}}\right]$$

subject to $\|(\delta\beta_{row}, \delta\beta_{col})\|^2 \le 2\Delta^2$. (The constraint is to
match the step size to that of the first procedure.)

III. "Market": Certain mild axioms concerning bargaining
behavior of humans give a unique prediction for what
bargain is reached in any bargaining scenario. Let T be
the set of joint expected utilities for all the bargains that
a set of N bargainers might reach in a particular bargaining
scenario, and let d be the joint expected utility if no bargain
is reached. Then the "Nash bargaining concept"
predicts that the joint expected utility of the bargain
reached is $\operatorname{argmax}_{u \in T} \left[\prod_{i=1}^{N} (u_i - d_i)\right]$.

We can use the Nash bargaining concept to predict what
change to $\beta$ the players would agree to under a "market"
where they bargain with one another to determine that
change. To do this we fix the set of all allowed bargains
to the set of all pairs $\beta$ such that $\|\beta - \beta(t)\|^2 \le 2\Delta^2$, where
$\beta(t)$ is the current joint $\beta$. We also choose d to be the
joint expected utility at $\beta(t)$. So under Nash bargaining,
at each iteration t, the players choose the change in joint
$\beta$, $\delta\beta$, that maximizes the product

$$\left[E(u_{row} \mid \beta(t) + \delta\beta) - E(u_{row} \mid \beta(t))\right] \times \left[E(u_{col} \mid \beta(t) + \delta\beta) - E(u_{col} \mid \beta(t))\right]$$

subject to $\|\delta\beta\|^2 \le 2\Delta^2$. As in the other two procedures,
we use first order approximations in this one, to evaluate
the two differences in expected utilities.

FIG. 3. A QRE surface with paths shown for the anarchy (red), socialism
(blue) and market (purple) procedures. As in Fig. 1, the x and
y axes are player rationalities, $\beta_{row}$ and $\beta_{col}$, and the z axis is expected
utility (this time of player Row).
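A single step of each of the three procedures can be sketched as follows, given the first-order information they all use. This is an illustrative sketch under stated assumptions: `grad[i][j]` stands for $\partial E(u_i)/\partial \beta_j$ with index 0 for Row and 1 for Column, the function names are hypothetical, the anarchy step omits the ceiling on each $\beta_i$, and the market step scans a grid of directions rather than solving the maximization exactly. Under the first-order approximation the Nash product is homogeneous of degree two in the step, so when a mutually beneficial direction exists the maximum lies on the boundary of the constraint set.

```python
import math

def anarchy_step(grad, delta):
    """Each player i moves beta_i by delta times the sign of dE(u_i)/dbeta_i."""
    def sign(x):
        return (x > 0) - (x < 0)
    return [delta * sign(grad[i][i]) for i in range(2)]

def socialism_step(grad, delta):
    """Step of norm sqrt(2)*delta along the gradient of the summed utilities."""
    g = [grad[0][j] + grad[1][j] for j in range(2)]
    norm = math.hypot(g[0], g[1])
    if norm == 0.0:
        return [0.0, 0.0]
    radius = math.sqrt(2.0) * delta
    return [radius * g[0] / norm, radius * g[1] / norm]

def market_step(grad, delta, n_angles=3600):
    """Boundary step maximizing the product of first-order utility gains."""
    radius = math.sqrt(2.0) * delta
    best, best_val = [0.0, 0.0], 0.0
    for k in range(n_angles):
        theta = 2.0 * math.pi * k / n_angles
        step = [radius * math.cos(theta), radius * math.sin(theta)]
        gain_row = grad[0][0] * step[0] + grad[0][1] * step[1]
        gain_col = grad[1][0] * step[0] + grad[1][1] * step[1]
        # Only bargains that improve on the status quo for both players.
        if gain_row > 0 and gain_col > 0 and gain_row * gain_col > best_val:
            best, best_val = step, gain_row * gain_col
    return best
```

When each player's expected utility depends only on her own $\beta_i$ with unit slope, all three functions return the same step, $(\Delta, \Delta)$. They differ once the off-diagonal terms of `grad` are nonzero.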

In all three procedures the total change in $\beta$ in any step never
exceeds $\sqrt{2}\Delta$. This adiabaticity reduces the computational
burden on the players, by not changing the game too much
from one timestep to the next. (Similar assumptions are called
comparative statics in economics [34].)

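As in standard economics, we can quantify how good a full
path produced by a procedure is for society as a whole by calculating
the discounted sum of future utilities along the path,

$$Q = \sum_{t>0} \sum_{i=1}^{N} w_\gamma(t)\, E(u_i \mid \beta(t)),$$

where the weight $w_\gamma(t)$ discounts utility at time t according to
the discounting factor $\gamma$.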

So we can compare the three procedures by calculating the
Q's for the paths they generate starting from some shared $\beta$
at time t = 0. We did this for several representative initial
$\beta$'s for the surface in Fig. 3. Anarchy always did worse than
the other two procedures. Those others are compared to each
other in Fig. 4. When the discounting factor $\gamma$ is large (i.e.,
we are more concerned with near-term than long-term utility)
the market procedure does better; otherwise socialism does.
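The role of the discounting factor in ranking paths can be illustrated with a short sketch. The exponential weight $e^{-\gamma t}$ used here is an assumption, chosen only because it weights near-term utility more heavily as $\gamma$ grows. The two paths are hypothetical utility sequences, not outputs of the socialism or market procedures.

```python
import math

def discounted_total(path_utils, gamma):
    """Sum over steps t and players i of exp(-gamma * t) * E(u_i at step t).

    ASSUMPTION: the exponential weight is a stand-in for the discounting
    scheme; larger gamma puts more weight on near-term utility.
    """
    return sum(math.exp(-gamma * t) * sum(utils)
               for t, utils in enumerate(path_utils))

# Hypothetical per-step expected utilities [Row, Column] along two paths.
early_gain = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
late_gain = [[0.5, 0.5], [1.0, 1.0], [2.0, 2.0]]
```

With no discounting (`gamma = 0`) the path with the late gain has the larger total, whereas with heavy discounting (`gamma = 2`) the ranking reverses.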

All three procedures are local, looking only a single step 
into the future. A procedure that also considers the QRE sur- 
face's global geometry will produce better paths in general. In 
particular, such global information allows us to consider paths 
where a player loses expected utility for certain periods, but 
in the end all players are better off. Fig. 1 highlights such a
path, along which player Column always benefits but player
Row loses initially, before ultimately benefiting. (A cross-section
of the expected utility of Row along the path is shown
in Fig. 2.) Note that player Row might demand compensation
to agree to follow such a path where they temporarily lose expected
utility, e.g., in terms of a subsidy paid for with a bond
that is repaid by all players at the end of the path.



FIG. 4. The difference between the discounted sums of future expected
utilities of the two players under the "socialism" and "market"
procedures, plotted against the discounting factor $\gamma$.



Particularly interesting issues arise when setting full paths 
under the market model, if the players use discounted sum 
of future utilities to value full paths. For example, say that 
at t = 0 society starts to follow a path $\beta_0(t)$ that is a Nash
bargaining solution then. Then in general, for t' > 0, the path
$\beta_{t'}(t)$ that is a Nash bargaining solution for full paths starting
from $\beta_0(t')$ is not a truncation of $\beta_0(t)$ to t > t'. There is an
inconsistency across time. This raises many interesting issues 
concerning binding commitments, what it means for a path 
chosen by bargaining to be renegotiation-proof, etc. 

Multiple folds will exist for the QRE surfaces involving 
many kinds of game parameters, not just tax rates. Often such 
parameters will be set externally, perhaps in a noisy process. 
When this is the case, the QRE surface tells us how stable 
player behavior is against that external noise. For example, 
say the players are on the top fold of the surface in Fig. 1
with $\beta = (2, 4)$, so the joint behavior is near an edge of the
QRE surface. In this situation, small external noise may lead
the players to "fall off the edge", and undergo a discontinuous
jump to the lower surface. Moreover, even if the players
managed to (adiabatically slowly) restore their original rationalities
after such a jump, they would end up on the middle
fold of the region where $\beta_{row}$ is near 2, not on the good fold
they started in. Due to this, when an economic situation ex- 
hibits such qualitative features, it may behoove society to stay 
away from such edges in the QRE surface, even if that lowers 
total expected utility. 
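The failure to return to the original fold after a temporary change in rationality can be sketched for the game of Eq. (3). The sketch holds $\beta_{col}$ fixed, lowers $\beta_{row}$ from 5 to 0 in small steps, and then raises it back, relaxing to a nearby QRE at every step by iterating the composed logit responses. Here p is the probability that Row plays her first pure strategy. The function names, the step count, and the use of fixed-point iteration as a stand-in for adiabatic following are illustrative assumptions, and the sweep differs from the $\beta = (2, 4)$ example in the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def follow_qre(p, beta_row, beta_col, iters=2000):
    """Relax to a nearby QRE by iterating the composed logit responses."""
    for _ in range(iters):
        r = sigmoid(beta_col * (3.0 * p - 2.0))   # Column responds to Row
        p = sigmoid(beta_row * (3.0 * r - 1.0))   # Row responds to Column
    return p

def sweep(beta_col=5.0, beta_max=5.0, steps=100):
    """Lower beta_row from beta_max to 0 and raise it back, tracking p."""
    down = [beta_max * (steps - k) / steps for k in range(steps + 1)]
    up = list(reversed(down))
    p = follow_qre(1.0, beta_max, beta_col)   # start on the high-p branch
    p_start = p
    for beta_row in down + up:
        p = follow_qre(p, beta_row, beta_col)
    return p_start, p

p_start, p_end = sweep()
```

Although $\beta$ ends where it started, `p_end` lies on a different branch from `p_start`: the starting branch disappears at small $\beta_{row}$, and the system then follows the surviving branch back up.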